Perf: Optimize comparison kernels for inlined views #7731

zhuqi-lucas · 2025-06-21T15:23:46Z

Which issue does this PR close?

Add fast path for empty buffer case to skip compute len, and use view(u128) to compare directly.

Closes #7621

Rationale for this change

Add fast path for empty buffer case to skip compute len, and use view(u128) to compare directly.

What changes are included in this PR?

Add fast path for empty buffer case to skip compute len, and use view(u128) to compare directly.

Are there any user-facing changes?

No

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

zhuqi-lucas · 2025-06-21T15:24:34Z

The performance benchmark result,

the lt scalar StringViewArray shows more than 30% performance faster, and no obvious regression for all string view testing.

lt scalar StringViewArray                                                                                1.00     20.9±1.35ms        ? ?/sec    1.31     27.4±1.23ms        ? ?/sec

critcmp  main fast_path_view --filter "iew"
group                                                                                                    fast_path_view                         main
-----                                                                                                    --------------                         ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.01  1257.0±18.64µs        ? ?/sec    1.00  1249.7±10.10µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00  1235.9±18.20µs        ? ?/sec    1.00  1230.0±16.30µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.03   697.8±32.61µs        ? ?/sec    1.00    677.9±8.49µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.01    613.6±9.26µs        ? ?/sec    1.00   605.7±11.02µs        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00      4.2±0.05ms        ? ?/sec    1.06      4.4±0.10ms        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.01   510.0±13.43µs        ? ?/sec    1.00    503.6±7.12µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.00      4.0±0.03ms        ? ?/sec    1.01      4.0±0.04ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.00      4.0±0.04ms        ? ?/sec    1.05      4.2±0.04ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00      4.0±0.04ms        ? ?/sec    1.05      4.2±0.06ms        ? ?/sec
like_utf8view scalar complex                                                                             1.00     82.6±0.54ms        ? ?/sec    1.00     82.8±0.89ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00     76.7±0.56ms        ? ?/sec    1.01     77.2±0.63ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     19.0±0.18ms        ? ?/sec    1.00     19.0±0.30ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     19.1±0.22ms        ? ?/sec    1.01     19.3±0.19ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.01     19.2±0.27ms        ? ?/sec    1.00     19.1±0.27ms        ? ?/sec
like_utf8view scalar equals                                                                              1.00     14.1±0.13ms        ? ?/sec    1.00     14.1±0.13ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.01     18.6±0.58ms        ? ?/sec    1.00     18.5±0.61ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     10.8±0.15ms        ? ?/sec    1.01     11.0±0.22ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     18.7±0.34ms        ? ?/sec    1.00     18.8±0.41ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.00    619.0±3.76µs        ? ?/sec    1.00    616.8±3.57µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.02  1857.3±74.05µs        ? ?/sec    1.00  1815.2±22.35µs        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.00    609.7±7.74µs        ? ?/sec    1.00    611.9±6.24µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.01    195.6±3.62µs        ? ?/sec    1.00    193.5±3.08µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.04   695.1±82.83µs        ? ?/sec    1.00    671.2±6.65µs        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.00   491.0±11.87µs        ? ?/sec    1.00    489.8±5.67µs        ? ?/sec
lt scalar StringViewArray                                                                                1.00     20.9±1.35ms        ? ?/sec    1.31     27.4±1.23ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.04   516.6±19.16µs        ? ?/sec    1.00    499.0±6.16µs        ? ?/sec

zhuqi-lucas · 2025-06-21T15:46:02Z

Hi @Dandandan
The fuzz testing failed, i am not sure if we can compare the view directly.

Based on what we’re seeing, it looks like directly comparing the raw u128 views can violate true lexicographic ordering under little‑endian layout—fuzz has caught cases where padding and byte‑order make u128.cmp() pick the wrong winner.

Dandandan · 2025-06-21T18:31:20Z

Hi @Dandandan The fuzz testing failed, i am not sure if we can compare the view directly.

Based on what we’re seeing, it looks like directly comparing the raw u128 views can violate true lexicographic ordering under little‑endian layout—fuzz has caught cases where padding and byte‑order make u128.cmp() pick the wrong winner.

I think we have to make sure only to compare the inlined string, not the entire u128/view value!

Dandandan · 2025-06-21T20:47:31Z

I think for it to work we probably should:

Add some benchmarks for arrays with only inlined string views (12 bytes or smaller)
Change the implementation of comparison views to not be on individual level, but on array level - so outside of the loop we can decide on whether to use the fast path or not.

zhuqi-lucas · 2025-06-22T04:52:37Z

Thank you @Dandandan , addressed in latest PR, and add benchmark, the result is amazing, 3.x faster and 10x faster.

critcmp  main fast_path_view --filter "inlined bytes"
group                                               fast_path_view                         main
-----                                               --------------                         ----
eq StringViewArray StringViewArray inlined bytes    1.00      3.7±0.09ms        ? ?/sec    3.33     12.3±0.15ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes    1.00      5.2±0.13ms        ? ?/sec    9.80     51.3±1.76ms        ? ?/sec

alamb · 2025-06-22T12:13:37Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fast_path_view (c089841) to 1ededfe diff
BENCH_NAME=comparison_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench comparison_kernels
BENCH_FILTER=StringView
BENCH_BRANCH_NAME=fast_path_view
Results will be posted here when complete

alamb

This code looks good to me -- thank you @zhuqi-lucas

I am rerunning the benchmarks to double check performance but it looks great

alamb · 2025-06-22T12:11:04Z

arrow-ord/src/cmp.rs

+            let r_bytes = unsafe { GenericByteViewArray::<T>::inline_value(r_view, 12) };
+            return l_bytes.cmp(r_bytes).is_eq();
+        }
+
        // # Safety
        // The index is within bounds as it is checked in value()


nit: I think it would be good to move the Safety comment up along with the calls to unsafe

Thank you @alamb for review, addressed it in latest PR.

alamb · 2025-06-22T12:19:18Z

🤖: Benchmark completed

Details

group                                                                                                    fast_path_view                         main
-----                                                                                                    --------------                         ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.03      2.9±0.04ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00      3.0±0.03ms        ? ?/sec    1.02      3.0±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.03      2.3±0.06ms        ? ?/sec    1.00      2.2±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.00      2.2±0.04ms        ? ?/sec    1.01      2.2±0.07ms        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00     25.0±0.24ms        ? ?/sec    1.00     24.9±0.24ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00     23.6±0.14ms        ? ?/sec  
eq long same prefix strings StringViewArray                                                              1.00    921.5±5.29µs        ? ?/sec    1.00    923.2±6.22µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.06     17.7±0.08ms        ? ?/sec    1.00     16.8±0.15ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.06     17.9±0.18ms        ? ?/sec    1.00     17.0±0.15ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.06     17.9±0.12ms        ? ?/sec    1.00     17.0±0.10ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     24.6±0.25ms        ? ?/sec  
lt long same prefix strings StringViewArray                                                              1.02    899.1±3.97µs        ? ?/sec    1.00    881.0±4.03µs        ? ?/sec
lt scalar StringViewArray                                                                                1.01     58.3±0.11ms        ? ?/sec    1.00     57.7±0.11ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    918.3±4.04µs        ? ?/sec    1.01    925.2±4.53µs        ? ?/sec

Dandandan · 2025-06-22T12:21:38Z

arrow-ord/src/cmp.rs

@@ -566,12 +566,18 @@ impl<'a, T: ByteViewType> ArrayOrd for &'a GenericByteViewArray<T> {
    type Item = (&'a GenericByteViewArray<T>, usize);

    fn is_eq(l: Self::Item, r: Self::Item) -> bool {
+        let l_view = unsafe { l.0.views().get_unchecked(l.1) };
+        let r_view = unsafe { r.0.views().get_unchecked(r.1) };
+        if l.0.data_buffers().is_empty() && r.0.data_buffers().is_empty() {


I think the code probably depends on inlining / moving this check outside of the loop?
Should we add #[inline(always)] to it to make sure it keeps doing that? 🤔

Thank you @Dandandan for this good suggestion, i addressed it in latest PR.

I add the #[inline(always)] to eq(), lt() and compare_byte_view(), i am sure if it's the right way.

Dandandan · 2025-06-22T12:29:35Z

My results match the 3x / 10x improvement 🚀

eq StringViewArray StringViewArray inlined bytes
                        time:   [6.5763 ms 6.6128 ms 6.6497 ms]
                        change: [−62.258% −62.065% −61.859%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

lt StringViewArray StringViewArray inlined bytes
                        time:   [7.9021 ms 7.9405 ms 7.9766 ms]
                        change: [−89.725% −89.674% −89.632%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  7 (7.00%) low mild

alamb · 2025-06-22T12:33:29Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fast_path_view (dddf67a) to 1ededfe diff
BENCH_NAME=comparison_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench comparison_kernels
BENCH_FILTER=StringView
BENCH_BRANCH_NAME=fast_path_view
Results will be posted here when complete

Dandandan · 2025-06-22T12:35:21Z

arrow-ord/src/cmp.rs

+    if left.data_buffers().is_empty() && right.data_buffers().is_empty() {
+        // Only need to compare the inlined bytes
+        let l_bytes = unsafe {
+            GenericByteViewArray::<T>::inline_value(left.views().get_unchecked(left_idx), 12)


Hmm I think this might not be correct for > / < in case of zeros in the values (e.g. https://en.wikipedia.org/wiki/Null_character).
I think we should add the length to the comparison (or do something smarter? I am not sure if that's possible).

This returns greater (instead of Equal):

> println!("{:?}", "\0".cmp("")) > Greater

Good point @Dandandan , let me find a good way to handle it!

Dandandan

Promising results!
I think we need some test to show it is correct with null characters and different lengths, I think we need to modify the implementation a bit.

alamb · 2025-06-22T12:39:00Z

🤖: Benchmark completed

Details

group                                                                                                    fast_path_view                         main
-----                                                                                                    --------------                         ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00      2.9±0.04ms        ? ?/sec    1.00      2.9±0.07ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00      3.0±0.04ms        ? ?/sec    1.02      3.0±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.00      2.2±0.05ms        ? ?/sec    1.03      2.3±0.08ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.01      2.2±0.05ms        ? ?/sec    1.00      2.2±0.04ms        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00     24.9±0.25ms        ? ?/sec    1.00     24.9±0.27ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00     23.9±0.25ms        ? ?/sec  
eq long same prefix strings StringViewArray                                                              1.00    919.5±5.05µs        ? ?/sec    1.00    920.2±4.66µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.06     17.6±0.09ms        ? ?/sec    1.00     16.7±0.16ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.04     17.8±0.17ms        ? ?/sec    1.00     17.1±0.09ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.05     17.8±0.13ms        ? ?/sec    1.00     16.9±0.11ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     24.6±0.28ms        ? ?/sec  
lt long same prefix strings StringViewArray                                                              1.03    907.7±8.25µs        ? ?/sec    1.00    879.4±4.85µs        ? ?/sec
lt scalar StringViewArray                                                                                1.01     58.3±0.16ms        ? ?/sec    1.00     57.7±0.17ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    915.7±4.42µs        ? ?/sec    1.01    921.6±4.44µs        ? ?/sec

zhuqi-lucas · 2025-06-22T14:26:13Z

Promising results! I think we need some test to show it is correct with null characters and different lengths, I think we need to modify the implementation a bit.

Thank you @Dandandan , updated the code, and we still can gain some performance improvement for latest code.

zhuqi-lucas · 2025-06-22T14:42:22Z

@Dandandan @alamb
The benchmark result, i run locally for latest PR:

critcmp  main fast_path_view --filter "iew"
group                                                                                                    fast_path_view                         main
-----                                                                                                    --------------                         ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.01  1252.0±14.91µs        ? ?/sec    1.00  1242.8±12.91µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.01   1224.7±7.64µs        ? ?/sec    1.00  1215.0±22.97µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.01    673.7±3.22µs        ? ?/sec    1.00    670.3±2.76µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.02    620.5±7.92µs        ? ?/sec    1.00    609.3±5.52µs        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00      4.3±0.02ms        ? ?/sec    1.10      4.8±0.03ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00      8.2±0.05ms        ? ?/sec    1.57     12.8±0.10ms        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.04   521.7±11.72µs        ? ?/sec    1.00   500.1±12.99µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.00      3.8±0.04ms        ? ?/sec    1.01      3.9±0.04ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.00      4.0±0.04ms        ? ?/sec    1.02      4.1±0.03ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00      4.0±0.08ms        ? ?/sec    1.00      4.0±0.03ms        ? ?/sec
like_utf8view scalar complex                                                                             1.01     80.5±0.69ms        ? ?/sec    1.00     80.0±0.40ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00     76.5±0.65ms        ? ?/sec    1.00     76.7±0.63ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     18.8±0.21ms        ? ?/sec    1.01     19.1±0.46ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     18.9±0.19ms        ? ?/sec    1.00     18.9±0.27ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.01     18.9±0.23ms        ? ?/sec    1.00     18.8±0.19ms        ? ?/sec
like_utf8view scalar equals                                                                              1.02     14.3±0.62ms        ? ?/sec    1.00     14.0±0.17ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.00     18.1±0.19ms        ? ?/sec    1.01     18.2±0.12ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     10.5±0.10ms        ? ?/sec    1.00     10.5±0.16ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     18.4±0.20ms        ? ?/sec    1.00     18.4±0.15ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.01   624.2±32.90µs        ? ?/sec    1.00   615.5±13.97µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.02  1849.8±73.20µs        ? ?/sec    1.00  1813.0±19.86µs        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.00    593.1±4.95µs        ? ?/sec    1.01   598.3±13.30µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.03    195.4±5.64µs        ? ?/sec    1.00    190.2±1.29µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00    671.4±5.58µs        ? ?/sec    1.01   676.8±20.18µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     47.6±1.05ms        ? ?/sec    1.11     52.8±1.38ms        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.06    512.5±9.08µs        ? ?/sec    1.00    482.1±5.10µs        ? ?/sec
lt scalar StringViewArray                                                                                1.00     26.8±0.92ms        ? ?/sec    1.04     28.0±1.56ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.03    521.1±9.77µs        ? ?/sec    1.00    503.7±9.93µs        ? ?/sec

Still have 57% and %11 for the inlined case:

eq StringViewArray StringViewArray inlined bytes                                                         1.00      8.2±0.05ms        ? ?/sec    1.57     12.8±0.10ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     47.6±1.05ms        ? ?/sec    1.11     52.8±1.38ms        ? ?/sec

Dandandan · 2025-06-22T15:37:01Z

arrow-ord/src/cmp.rs

        let r_len = *r_view as u32;
        // This is a fast path for equality check.
        // We don't need to look at the actual bytes to determine if they are equal.
        if l_len != r_len {
            return false;
        }

+        // When both len are same, we can compare the inlined bytes, this can handle the corner case after


we can also compare earlier based on actual length instead like in the other place?

i am also not sure comparing length here is really a fast path, because slice equality also checks on slice first before accessing the element.

Actually I think we can do the eq case on the u128 value directly @zhuqi-lucas (compare both length and value to be equal)

I agree with you @Dandandan , thank you for catching this! Addressed in latest PR.

After this change, eq optimize a lot, 4.75 faster.

critcmp main fast_path_view --filter "inlined bytes" group fast_path_view main ----- -------------- ---- eq StringViewArray StringViewArray inlined bytes 1.00 2.7±0.01ms ? ?/sec 4.75 12.8±0.10ms ? ?/sec lt StringViewArray StringViewArray inlined bytes 1.00 47.3±0.55ms ? ?/sec 1.12 52.8±1.38ms ? ?/sec

zhuqi-lucas · 2025-06-23T10:08:13Z

Similar changes for datafusion side, it shows good result for Q11 sort-tpch which is mostly using inlined bytes to compare.

apache/datafusion#16509

Dandandan · 2025-06-23T10:16:26Z

arrow-ord/src/cmp.rs

+            let r_view = unsafe { r.0.views().get_unchecked(r.1) };
+            let l_len = *l_view as u32 as usize;
+            let r_len = *r_view as u32 as usize;
+            let l_bytes = unsafe { GenericByteViewArray::<T>::inline_value(l_view, l_len) };


Perhaps the main thing I see is what we could do next is do inline_value on a u128 level rather than &[u8] and compare the values on u128 and length if equal.

Thank you @Dandandan , interesting, i will investigate this idea!

Dandandan · 2025-06-23T11:11:23Z

All in all it's good, I think we can improve it further, I'll let you decide @zhuqi-lucas if you want to do it in this PR or later!

I think we can also improve the non-inlined case with prefixes in the views, I will create an issue about it later.

zhuqi-lucas · 2025-06-23T11:14:02Z

Thank you @Dandandan , let's create the issues for further improvement, i am willing to investigate and take the related issues.

zhuqi-lucas added 2 commits June 21, 2025 23:06

Fast path for string view compare

4bc4122

Merge remote-tracking branch 'upstream/main' into fast_path_view

0e4a6cd

github-actions bot added the arrow Changes to the arrow crate label Jun 21, 2025

fmt

8a552a4

zhuqi-lucas changed the title ~~Fast path view~~ Perf: Optimize comparison kernels for inlined views Jun 21, 2025

zhuqi-lucas added 3 commits June 22, 2025 11:44

Address comments

399297f

clippy

e9b5e44

add benchmark

c089841

alamb approved these changes Jun 22, 2025

View reviewed changes

Dandandan reviewed Jun 22, 2025

View reviewed changes

address comments

dddf67a

Dandandan reviewed Jun 22, 2025

View reviewed changes

Dandandan requested changes Jun 22, 2025

View reviewed changes

alamb self-requested a review June 22, 2025 12:42

new code

4fa53b5

Merge remote-tracking branch 'upstream/main' into fast_path_view

5eec94b

Dandandan reviewed Jun 22, 2025

View reviewed changes

address comments

fb82e8b

Dandandan approved these changes Jun 23, 2025

View reviewed changes

zhuqi-lucas mentioned this pull request Jun 23, 2025

Perf: Optimize CursorValues compare performance for StringViewArray apache/datafusion#16508

Closed

Dandandan reviewed Jun 23, 2025

View reviewed changes

Dandandan merged commit 6d01dd9 into apache:main Jun 23, 2025
23 checks passed

Dandandan mentioned this pull request Jun 23, 2025

Use prefix first for comparisons, resort to data buffer for remaining data on equal values #7744

Open

alamb mentioned this pull request Jun 23, 2025

Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) apache/datafusion#16509

Merged

Perf: Optimize comparison kernels for inlined views #7731

Perf: Optimize comparison kernels for inlined views #7731

Uh oh!

Conversation

zhuqi-lucas commented Jun 21, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Uh oh!

zhuqi-lucas commented Jun 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zhuqi-lucas commented Jun 21, 2025

Uh oh!

Dandandan commented Jun 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dandandan commented Jun 21, 2025

Uh oh!

zhuqi-lucas commented Jun 22, 2025

Uh oh!

alamb commented Jun 22, 2025

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alamb commented Jun 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zhuqi-lucas Jun 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dandandan commented Jun 22, 2025

Uh oh!

alamb commented Jun 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dandandan left a comment

Choose a reason for hiding this comment

Uh oh!

alamb commented Jun 22, 2025

Uh oh!

zhuqi-lucas commented Jun 22, 2025

Uh oh!

zhuqi-lucas commented Jun 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dandandan Jun 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zhuqi-lucas commented Jun 23, 2025

Uh oh!

Dandandan Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Dandandan commented Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

zhuqi-lucas commented Jun 21, 2025 •

edited

Loading

Dandandan commented Jun 21, 2025 •

edited

Loading

zhuqi-lucas Jun 22, 2025 •

edited

Loading

zhuqi-lucas commented Jun 22, 2025 •

edited

Loading

Dandandan Jun 22, 2025 •

edited

Loading

Dandandan Jun 23, 2025 •

edited

Loading

Dandandan commented Jun 23, 2025 •

edited

Loading